
How to Make Wavelets 
Author(s): Robert S. Strichartz 

Source: The American Mathematical Monthly, Vol. 100, No. 6 (Jun. - Jul., 1993), pp. 539-556
Published by: Mathematical Association of America 
Stable URL: http://www.jstor.org/stable/2324613 

How To Make Wavelets 


Robert S. Strichartz 


§1. INTRODUCTION. The French call them ondelettes, these new high-tech gadgets in the arsenal of harmonic analysis. Move over, Fourier! Your series and transforms are not the only game in town. Wavelet expansions enjoy a number of good properties not available in other types of expansions. To see this in the simplest context, consider a real-valued function $f(x)$ on the interval $[0, 1]$. You can expand it in a Fourier series


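$$f(x) = b_0 + \sum_{k=1}^{\infty} \left(b_k \cos 2\pi k x + a_k \sin 2\pi k x\right) \tag{1.1}$$

or you can expand it in a Haar function series

$$f(x) = c_0 + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} c_{jk}\,\psi(2^j x - k) \tag{1.2}$$

where $\psi(x)$ is the function defined by

$$\psi(x) = \begin{cases} 1 & \text{if } 0 < x < \tfrac{1}{2} \\ -1 & \text{if } \tfrac{1}{2} < x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{1.3}$$

(see Figure 1).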



Figure 1 . The graph of the generator of the Haar functions. 






Both series are examples of expansions in terms of orthogonal functions in $L^2(0, 1)$. Thus there are simple formulas for the coefficients. (Exercise: Show that $\{\psi(2^j x - k)\}$ are orthogonal, but not normalized.) But the Fourier series is not well localized in space; if you are interested in the behavior of $f(x)$ on a subinterval $[a, b]$ you need to involve all the Fourier coefficients. On the other hand, the Haar series is very well localized in that to restrict attention to the subinterval $[a, b]$ you need only take the sum in (1.2) over those indices for which the interval $I_{jk} = [2^{-j}k, 2^{-j}(k + 1)]$ (the support of $\psi(2^j x - k)$) intersects $[a, b]$. Furthermore, the partial sums of the Haar series (summing $0 \le j \le N$) clearly represent an approximation to $f$ taking into account details on the order of magnitude $2^{-N}$ or greater. These two properties, localization in space, and scaling, are the hallmarks of wavelet expansions. In addition, the Haar functions are created out of a single function $\psi$ by dyadic dilations and integer translations. Essentially the same property is shared by all the wavelet bases we will discuss, and may in fact be taken as an approximate definition of a wavelet expansion.
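The scaling property of the Haar partial sums can be checked on a small example. The following is a minimal sketch, not part of the original article: it assumes $f$ is known only through its values at the midpoints of $n = 2^M$ equal cells of $[0, 1]$, replaces integrals by averages over those samples, and uses the illustrative names `haar_psi` and `haar_partial_sum`. The coefficients are computed as $c_{jk} = 2^j \int f(x)\psi(2^j x - k)\,dx$, since $\|\psi(2^j x - k)\|_2^2 = 2^{-j}$.

```python
import math

def haar_psi(x):
    """Haar generator (1.3): +1 on (0, 1/2), -1 on (1/2, 1), 0 elsewhere."""
    if 0 < x < 0.5:
        return 1.0
    if 0.5 < x < 1:
        return -1.0
    return 0.0

def haar_partial_sum(samples, levels):
    """Partial sum of (1.2) over 0 <= j < levels.

    `samples` holds f at the midpoints (i + 1/2)/n of n equal cells of [0, 1];
    n is assumed to be a power of two with n >= 2**levels.
    """
    n = len(samples)
    xs = [(i + 0.5) / n for i in range(n)]
    c0 = sum(samples) / n                      # c_0 = integral of f over [0, 1]
    total = [c0] * n
    for j in range(levels):
        for k in range(2 ** j):
            basis = [haar_psi(2 ** j * x - k) for x in xs]
            # c_jk = <f, psi_jk> / ||psi_jk||^2, with ||psi_jk||^2 = 2^(-j)
            c_jk = 2 ** j * sum(f * b for f, b in zip(samples, basis)) / n
            total = [t + c_jk * b for t, b in zip(total, basis)]
    return total

n = 64
f = [math.sin(2 * math.pi * (i + 0.5) / n) + ((i + 0.5) / n) ** 2 for i in range(n)]
full = haar_partial_sum(f, 6)     # all levels: reproduces the samples
coarse = haar_partial_sum(f, 3)   # three levels: averages over cells of length 1/8
```

With all six levels the samples are recovered, while three levels give the average of $f$ over each cell of length $2^{-3}$. Changing one sample alters only the coefficients whose support $I_{jk}$ contains it, which is the localization property in miniature.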

The wavelet expansions we are going to construct can be thought of as generalizations of the Haar series, in which the function $\psi$ is replaced by smoother cousins. Before we can say exactly what properties we want these functions to have, and how we can go about constructing them, it is useful to backtrack and see exactly how the Haar functions arise. It will turn out to be easier if we consider the whole line as the domain of our functions.

§2. THE ROUGH-AND-READY HAAR WAVELETS. We begin with the function $\varphi$ = characteristic function of the unit interval $[0, 1]$. Surely this is one of the simplest functions one can imagine, but it is chosen because it has two important properties:

(i) the translates of $\varphi$ by integers, $\varphi(x - k)$, $k \in \mathbb{Z}$, form an orthonormal set of functions for $L^2(\mathbb{R})$;

(ii) $\varphi$ is self-similar. If you cut the graph in half then each half can be expanded to recover the whole graph. This property can be expressed algebraically by the scaling identity

$$\varphi(x) = \varphi(2x) + \varphi(2x - 1). \tag{2.1}$$

We will call $\varphi$ the scaling function. (In the French literature it is sometimes called “le père” and $\psi$ is called “la mère,” but this shows a scandalous misunderstanding of human reproduction; in fact the generation of wavelets more closely resembles the reproductive life style of an amoeba.) In fact, the scaling identity essentially determines $\varphi$ up to a constant multiple (exercise). The significance of the scaling identity is the following: Let $V_0$ denote the linear span of the functions $\varphi(x - k)$, $k \in \mathbb{Z}$ (or by abuse of notation the closure in $L^2(\mathbb{R})$ of this span, $\sum_{k=-\infty}^{\infty} a_k \varphi(x - k)$ with $\sum |a_k|^2 < \infty$). This is a natural space to consider in view of (i), since the functions $\varphi(x - k)$ form an orthonormal basis for $V_0$. Of course $V_0$ is not all of $L^2$; it is the subspace of piecewise constant functions with jump discontinuities at $\mathbb{Z}$. We can get a larger space by rescaling. Let $(1/2)\mathbb{Z}$ denote the lattice of half-integers $k/2$, $k \in \mathbb{Z}$, and let $V_1$ denote the subspace of $L^2$ of piecewise constant functions with jumps at $(1/2)\mathbb{Z}$. It is clear that $f(x) \in V_0$ if and only if $f(2x) \in V_1$, and the functions $2^{1/2}\varphi(2x - k)$ form an orthonormal basis for $V_1$ (the factor $2^{1/2}$ is thrown in to make the normalization $\|2^{1/2}\varphi(2x - k)\|_2 = 1$ hold). The scaling identity (2.1), or rather its translated version

$$\varphi(x - k) = \varphi(2x - 2k) + \varphi(2x - 2k - 1) \tag{2.1'}$$

says exactly $V_0 \subset V_1$, since a basis for $V_0$ is explicitly represented as linear combinations of basis elements of $V_1$. (Of course the containment $V_0 \subset V_1$ is clear from the description of the spaces $V_0$ and $V_1$ in terms of locations of jump discontinuities, but in the generalizations to come there will be no such simple description; however, there will be a scaling identity.)

The whole story can now be iterated, both up and down the dyadic scale. The result is an increasing sequence of subspaces $V_j$ for $j \in \mathbb{Z}$, where $V_j$ consists of the piecewise constant $L^2$ functions with jumps at $2^{-j}\mathbb{Z}$, and the functions $2^{j/2}\varphi(2^j x - k)$ for $k \in \mathbb{Z}$ form an orthonormal basis for $V_j$. We can pass back and forth among the spaces $V_j$ by rescaling: $f(x) \in V_j$ if and only if $f(2^{k-j}x) \in V_k$, and the scaling identity (2.1), suitably rescaled, says $V_j \subset V_k$ if $j < k$. The sequence $\{V_j\}$ is an example of what is called a multiresolution analysis. There are two other properties of $\{V_j\}$ that are significant, namely

$$\bigcap_{j \in \mathbb{Z}} V_j = \{0\}, \tag{2.2}$$

and

$$\bigcup_{j \in \mathbb{Z}} V_j \text{ is dense in } L^2 \tag{2.3}$$

(exercise).

In view of (2.3) it would seem tempting to try to combine all the orthonormal bases $\{2^{j/2}\varphi(2^j x - k)\}$ of $V_j$ into one orthonormal basis for $L^2(\mathbb{R})$. But look, although $V_j \subset V_{j+1}$, the orthonormal basis $\{2^{j/2}\varphi(2^j x - k)\}$ for $V_j$ is not contained in the orthonormal basis $\{2^{(j+1)/2}\varphi(2^{j+1} x - k)\}$ for $V_{j+1}$. (Indeed, there are distinct elements in the two orthonormal bases that are not orthogonal to each other.) So our first naive attempt to obtain an orthonormal basis for $L^2(\mathbb{R})$ is flawed. Can we fix it up?

Back to the drawing boards! Since $V_0 \subset V_1$ and we have an orthonormal basis for $V_0$ of the form $\{\varphi(x - k)\}$, why don’t we try to complete an orthonormal basis of $V_1$ by adjoining functions of the form $\{\psi(x - k)\}$ for some function $\psi$? This is the same thing as asking for an orthonormal basis of the desired form for the orthogonal complement of $V_0$ in $V_1$, which we denote $W_0$, so $V_1 = V_0 \oplus W_0$ (Hilbert space direct sum).

The answer is easy: we want to take $\psi$ exactly to be the Haar function generator defined in §1. Note that $\psi$ can be expressed in terms of $\varphi$ by

$$\psi(x) = \varphi(2x) - \varphi(2x - 1) \tag{2.4}$$

which is very reminiscent of the scaling identity. Exercise: show that $\{\psi(x - k)\}$ forms an orthonormal basis for $W_0$. But now we can rescale the space $W_0$, so

$$V_{j+1} = V_j \oplus W_j \tag{2.5}$$

and $\{2^{j/2}\psi(2^j x - k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $W_j$. If we combine conditions (2.2), (2.3) and (2.5) we obtain

$$L^2(\mathbb{R}) = \bigoplus_{j=-\infty}^{\infty} W_j \tag{2.6}$$

and since the spaces $W_j$ are all mutually orthogonal we can now refine our naive attempt and combine all the orthonormal bases for $W_j$ into one grand orthonormal basis $\{2^{j/2}\psi(2^j x - k)\}_{j \in \mathbb{Z},\, k \in \mathbb{Z}}$ for $L^2(\mathbb{R})$. (The only change is that we have replaced the scaling function $\varphi$ by the wavelet $\psi$.) This gives the Haar series basis for the whole line. There is a minor variation on this theme that is perhaps more closely related to the Haar expansion on the unit interval: instead of (2.6) we can also write

$$L^2(\mathbb{R}) = V_0 \oplus \Big(\bigoplus_{j=0}^{\infty} W_j\Big) \tag{2.6'}$$

and then combine the basis $\{\varphi(x - k)\}_{k \in \mathbb{Z}}$ for $V_0$ with the bases $\{2^{j/2}\psi(2^j x - k)\}_{k \in \mathbb{Z}}$ for $W_j$ with $j \ge 0$, to obtain an orthonormal basis for $L^2(\mathbb{R})$.
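In coordinates, the splitting $V_1 = V_0 \oplus W_0$ is just a rotation of pairs of coefficients. The sketch below is an illustration under stated assumptions rather than anything taken from the article: a function in $V_1$ is represented by finitely many coefficients $d_m$ with respect to $2^{1/2}\varphi(2x - m)$, and the names `haar_split` and `haar_merge` are made up for the purpose. By (2.1') and (2.4), the coefficient on $\varphi(x - k)$ is $(d_{2k} + d_{2k+1})/\sqrt{2}$ and the coefficient on $\psi(x - k)$ is $(d_{2k} - d_{2k+1})/\sqrt{2}$.

```python
import math

def haar_split(d):
    """Split V_1 coefficients d (even length) into V_0 and W_0 coefficients.

    d[m] multiplies sqrt(2) * phi(2x - m); the result (s, w) multiplies
    phi(x - k) and psi(x - k) respectively.
    """
    r = math.sqrt(2.0)
    half = len(d) // 2
    s = [(d[2 * k] + d[2 * k + 1]) / r for k in range(half)]
    w = [(d[2 * k] - d[2 * k + 1]) / r for k in range(half)]
    return s, w

def haar_merge(s, w):
    """Inverse of haar_split: recover the V_1 coefficients from (s, w)."""
    r = math.sqrt(2.0)
    d = []
    for sk, wk in zip(s, w):
        d.append((sk + wk) / r)
        d.append((sk - wk) / r)
    return d

def energy(c):
    """Sum of squares, i.e. the squared L^2 norm in an orthonormal basis."""
    return sum(x * x for x in c)

d = [4.0, 2.0, 5.0, 5.0, -1.0, 3.0, 0.0, 7.0]
s, w = haar_split(d)
```

Because both bases are orthonormal, the sum of squares is preserved, and applying `haar_split` again to `s` descends one more step of the dyadic scale, which is the iteration behind (2.6').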

§3. MULTIRESOLUTION ANALYSIS. The moral of the story so far is that we first want to build a scaling function $\varphi$ and associated multiresolution analysis $\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots$ before constructing the wavelets.

Definition. A multiresolution analysis $\cdots \subseteq V_{-1} \subseteq V_0 \subseteq V_1 \subseteq \cdots$ with scaling function $\varphi$ is an increasing sequence of subspaces of $L^2(\mathbb{R})$ satisfying the following four conditions:

(i) (density) $\bigcup_j V_j$ is dense in $L^2(\mathbb{R})$,

(ii) (separation) $\bigcap_j V_j = \{0\}$,

(iii) (scaling) $f(x) \in V_j \Leftrightarrow f(2^{-j}x) \in V_0$,

(iv) (orthonormality) $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is an orthonormal basis for $V_0$.

It follows easily from the definition that $\{2^{j/2}\varphi(2^j x - \gamma)\}_{\gamma \in \mathbb{Z}}$ forms an orthonormal basis for $V_j$. Since $\varphi \in V_0 \subset V_1$ we must have

$$\varphi(x) = \sum_{\gamma \in \mathbb{Z}} a(\gamma)\varphi(2x - \gamma) \tag{3.1}$$

for some coefficients $a(\gamma)$ satisfying

$$\sum_{\gamma \in \mathbb{Z}} |a(\gamma)|^2 = 2 \tag{3.2}$$

and in fact

$$a(\gamma) = 2\int \varphi(x)\overline{\varphi(2x - \gamma)}\,dx. \tag{3.3}$$

Equation (3.1) is the analogue of (2.1), and we will refer to it as the scaling identity. 

It follows from the definition that the scaling function determines the multiresolution analysis, but not conversely. A more difficult question is how to characterize those functions $\varphi$ which are scaling functions for a multiresolution analysis. Here we expect the scaling identity to play a crucial role, but before we can say more we need to examine certain algebraic conditions on the coefficients $a(\gamma)$ that follow from the definition.

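First, there is a consistency condition that arises from (iv) and (3.1). We know from (iv) that

$$\int \varphi(x - \gamma)\overline{\varphi(x)}\,dx = \delta(\gamma, 0) \tag{3.4}$$

(Kronecker $\delta$). If we use (3.1) to substitute for $\varphi(x - \gamma)$ and $\varphi(x)$ in (3.4) we obtain

$$\sum_{\gamma' \in \mathbb{Z}} \sum_{\gamma'' \in \mathbb{Z}} a(\gamma')\overline{a(\gamma'')} \int \varphi(2x - 2\gamma - \gamma')\overline{\varphi(2x - \gamma'')}\,dx = 2^{-1} \mathop{\sum\sum}_{\gamma'' = 2\gamma + \gamma'} a(\gamma')\overline{a(\gamma'')} = \delta(\gamma, 0)$$

after the change of variable $x \to 2^{-1}x$ and use of (3.4). We rewrite this as

$$\sum_{\gamma' \in \mathbb{Z}} a(\gamma')\overline{a(2\gamma + \gamma')} = 2\delta(\gamma, 0). \tag{3.5}$$

Note that (3.5) contains (3.2) as a special case.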

Another algebraic condition arises if we assume $\varphi$ is integrable and $\int \varphi(x)\,dx \ne 0$ (if $\int \varphi(x)\,dx = 0$ then the same is true for all functions in all $V_j$, so we would not expect to have the density condition (i)). Then we integrate (3.1) and make a change of variable to obtain

$$\int \varphi(x)\,dx = \sum_{\gamma \in \mathbb{Z}} a(\gamma) \int \varphi(2x - \gamma)\,dx = \sum_{\gamma \in \mathbb{Z}} a(\gamma)\,2^{-1} \int \varphi(x)\,dx$$

hence

$$\sum_{\gamma \in \mathbb{Z}} a(\gamma) = 2. \tag{3.6}$$
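The identities (3.5) and (3.6) are easy to test for any finite list of coefficients. The following sketch assumes the coefficients are stored as a dictionary from $\gamma$ to $a(\gamma)$; the helper names are illustrative. Two of the test sequences come from the text, namely the Haar coefficients $a(0) = a(1) = 1$ from (2.1) and the two-term sequence $a(0) = a(3) = 1$. The four-tap sequence is not taken from this article: it is the well-known Daubechies four-coefficient filter, rescaled here so that the sum is 2.

```python
import math

def shifted_product(a, g):
    """Left side of (3.5): sum over g' of a(g') * conj(a(2g + g'))."""
    return sum(v * complex(a.get(2 * g + i, 0.0)).conjugate() for i, v in a.items())

def satisfies_3_5_and_3_6(a, tol=1e-12):
    """True if the finite sequence a = {gamma: a(gamma)} satisfies (3.5) and (3.6)."""
    span = max(a) - min(a)            # shifts beyond the support give empty sums
    for g in range(-span, span + 1):
        target = 2.0 if g == 0 else 0.0
        if abs(shifted_product(a, g) - target) > tol:
            return False
    return abs(sum(a.values()) - 2.0) <= tol

r3 = math.sqrt(3.0)
haar = {0: 1.0, 1: 1.0}
four_tap = {0: (1 + r3) / 4, 1: (3 + r3) / 4, 2: (3 - r3) / 4, 3: (1 - r3) / 4}
two_term = {0: 1.0, 3: 1.0}
lopsided = {0: 1.0, 1: 0.5, 2: 0.5}   # sums to 2 but violates (3.2)
```

The first three sequences pass and the last one fails. That `two_term` passes is worth noticing: the identities are necessary consequences of the definition, and by themselves they do not guarantee orthonormal translates.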

Now we would like to reverse the procedure. Step 1 will be to produce solutions $a(\gamma)$ to the algebraic identities (3.5) and (3.6). Step 2 will be to define the scaling function via the scaling identity (3.1). Notice that (3.1) says that $\varphi$ is a fixed point of the linear transformation

$$Sf(x) = \sum_{\gamma \in \mathbb{Z}} a(\gamma)f(2x - \gamma) \tag{3.7}$$

so it is reasonable to try to construct $\varphi$ by iterating $S$,

$$\varphi = \lim_{n \to \infty} S^n f \tag{3.8}$$

for some reasonable initial function $f$. In a later section we will discuss another method for solving (3.1). Step 3 will be to prove that the function $\varphi$ that solves (3.1) (normalized so $\|\varphi\|_2 = 1$) generates a multiresolution analysis. This is the trickiest step, because there are simple counterexamples to show that it is not always true (try $a(\gamma)$ equal to 1 for $\gamma = 0, 3$, and otherwise $a(\gamma) = 0$, and $\varphi = \chi_{[0,3]}$, which violates (iv)). Nevertheless, many choices of $a(\gamma)$ do yield a multiresolution analysis. The difficult condition to verify is the orthonormality (iv), and we will have to postpone the discussion of when and why this holds to a later section. In Box 1 we will show how to establish the density (i) and separation (ii), given orthonormality and the additional normalization condition

$$\int \varphi(x)\,dx = 1. \tag{3.9}$$
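The iteration (3.7)-(3.8) can be carried out exactly on a dyadic grid, because $2x - \gamma$ is again a grid point whenever $x$ is. Here is a minimal sketch under these assumptions: the coefficients are $a(0), \dots, a(N)$, the function is sampled at $x_i = i/2^M$ on $[0, N]$, the starting function is the characteristic function of $[0, 1)$, and the names `cascade_step` and `cascade` are illustrative. The four-tap coefficients are again the standard Daubechies filter rescaled to sum to 2, which does not appear in the text.

```python
import math

def cascade_step(f, a, M):
    """Apply (Sf)(x) = sum_g a[g] * f(2x - g) to samples f[i] = f(i / 2**M).

    The grid covers [0, N] with N = len(a) - 1; f is treated as 0 off the grid.
    """
    L = len(f) - 1
    out = [0.0] * (L + 1)
    for i in range(L + 1):
        for g, ag in enumerate(a):
            idx = 2 * i - g * 2 ** M      # grid index of the point 2x - g
            if 0 <= idx <= L:
                out[i] += ag * f[idx]
    return out

def cascade(a, M, iterations):
    """Return samples of S^n f for f = characteristic function of [0, 1)."""
    N = len(a) - 1
    L = N * 2 ** M
    f = [1.0 if i < 2 ** M else 0.0 for i in range(L + 1)]
    for _ in range(iterations):
        f = cascade_step(f, a, M)
    return f

r3 = math.sqrt(3.0)
four_tap = [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]

M = 8
haar_samples = cascade([1.0, 1.0], M, 5)   # the box function is a fixed point
phi_samples = cascade(four_tap, M, 8)      # approximate four-tap scaling function
riemann_sum = sum(phi_samples) / 2 ** M    # approximates the integral in (3.9)
```

For the Haar coefficients the starting function is reproduced at every step. For the four-tap filter each iterate is piecewise constant on cells of length $2^{-n}$, so taking at most $M$ iterations keeps the grid representation exact, and the Riemann sum stays equal to 1 because the coefficients sum to 2.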

Now we are ready to move on to Step 4, which is the construction of the wavelets themselves.

Proofs of Density and Separation

Lemma B1.1. Let $V_0$ be any subspace of $L^2(\mathbb{R})$ which is contained in $L^\infty(\mathbb{R})$ and which has the property that

$$\|f\|_\infty \le c\|f\|_2 \quad \text{for all } f \in V_0. \tag{B1.1}$$

Define $V_j$ by the scaling condition (iii) (no assumption of the sort $V_j \subset V_{j+1}$ is necessary). Then (ii) holds.

Proof: The scaling condition and a simple change of variable transform (B1.1) into

$$\|f\|_\infty \le c\,2^{j/2}\|f\|_2 \quad \text{for all } f \in V_j. \tag{B1.2}$$

If $f \in \bigcap V_j$ then (B1.2) holds for all $j$, and letting $j \to -\infty$ we obtain $\|f\|_\infty = 0$, hence $f = 0$. Q.E.D.

The estimate (B1.1) is easy to obtain in our case. For simplicity assume $\varphi$ is bounded and has compact support, which will be the case in all our examples. Then by the orthonormality (iv) we have

$$f(x) = \sum_{\gamma \in \mathbb{Z}} \varphi(x - \gamma) \int f(y)\overline{\varphi(y - \gamma)}\,dy = \int K(x, y)f(y)\,dy$$

where $K(x, y) = \sum_{\gamma \in \mathbb{Z}} \varphi(x - \gamma)\overline{\varphi(y - \gamma)}$, so

$$|f(x)| \le \Big(\int |K(x, y)|^2\,dy\Big)^{1/2} \|f\|_2 = \Big(\sum_{\gamma \in \mathbb{Z}} |\varphi(x - \gamma)|^2\Big)^{1/2} \|f\|_2$$

and $\sum_{\gamma \in \mathbb{Z}} |\varphi(x - \gamma)|^2$ is uniformly bounded (of course much weaker conditions on $\varphi$, such as rapid decrease, will also imply this).

Lemma B1.2. Assume $\varphi$ has compact support and satisfies (3.1) and (3.9), and the orthonormality condition (iv). Then the density condition (i) holds.

Sketch of Proof: Let $P_j f(x) = 2^j \sum_{\gamma \in \mathbb{Z}} \varphi(2^j x - \gamma) \int f(y)\overline{\varphi(2^j y - \gamma)}\,dy$ denote the orthogonal projection onto $V_j$. We need to show $\lim_{j \to \infty} P_j f = f$ in $L^2$ for all $f \in L^2$, which is equivalent to $\lim_{j \to \infty} \|P_j f\|_2^2 = \|f\|_2^2$ by the Pythagorean theorem. It suffices to prove this for $f = \chi_A$, $A$ any interval, by a density argument. But

$$\|P_j \chi_A\|_2^2 = 2^j \sum_{\gamma \in \mathbb{Z}} \Big|\int_A \varphi(2^j y - \gamma)\,dy\Big|^2 = 2^{-j} \sum_{\gamma \in \mathbb{Z}} \Big|\int_{2^j A} \varphi(y - \gamma)\,dy\Big|^2.$$

For large $j$, $2^j A$ will be a large interval, so essentially either $\int_{2^j A} \varphi(y - \gamma)\,dy = 0$ if $\gamma \notin 2^j A$ or $\int_{2^j A} \varphi(y - \gamma)\,dy = 1$ if $\gamma \in 2^j A$ by (3.9) (for $\gamma$ in a small neighborhood of the boundary of $2^j A$ this is not quite correct, but in the limit we can ignore this detail). Thus $\|P_j \chi_A\|_2^2 \approx 2^{-j}\,\#\{\gamma \in 2^j A\} \approx \operatorname{length}(A) = \|\chi_A\|_2^2$, and in the limit this becomes equality. Q.E.D.
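For the Haar scaling function the quantity in the sketch of proof can be computed in closed form, which makes the boundary effect visible. The following is an illustration only, assuming $\varphi = \chi_{[0,1]}$ and $A = [a, b]$; the function name is made up. In that case $\int_{2^j A} \varphi(y - \gamma)\,dy$ is the length of the overlap of $[\gamma, \gamma + 1]$ with $[2^j a, 2^j b]$.

```python
import math

def projection_energy(a, b, j):
    """||P_j chi_[a,b]||_2^2 for the Haar scaling function phi = chi_[0,1].

    Equals 2^(-j) times the sum over gamma of the squared length of
    [gamma, gamma + 1] intersected with [2^j a, 2^j b].
    """
    lo, hi = 2 ** j * a, 2 ** j * b
    total = 0.0
    for g in range(math.floor(lo), math.ceil(hi)):
        overlap = min(g + 1, hi) - max(g, lo)
        if overlap > 0:
            total += overlap ** 2
    return total / 2 ** j

a, b = 0.3, 1.7
energies = [projection_energy(a, b, j) for j in range(13)]
```

Interior cells contribute exactly their length, and only the at most two cells that straddle the endpoints of $2^j A$ fall short. Each of those loses at most $1/4$ before the factor $2^{-j}$, so the deficit $\operatorname{length}(A) - \|P_j \chi_A\|_2^2$ is at most $2^{-j}/2$.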

Notice that we could essentially reverse the argument to deduce the necessity of the normalization condition (3.9).

§4. THE WAVELETS. We will consider the scaling function $\varphi$ to be the first element $\varphi = \psi_0$ of a pair of functions $\psi_0, \psi_1$, with $\psi_1$ being the wavelet generator. We would like the functions $\{\psi_k(x - \gamma)\}_{\gamma \in \mathbb{Z},\, k = 0, 1}$ to be an orthonormal basis for $V_1$. Since the functions $\{\varphi(2x - \gamma)\}_{\gamma \in \mathbb{Z}}$ already form an orthogonal basis for $V_1$, the functions $\psi_0(x)$ and $\psi_1(x)$ must be linear combinations of $\varphi(2x - \gamma)$, so they must satisfy an identity

$$\psi_k(x) = \sum_{\gamma \in \mathbb{Z}} a_k(\gamma)\varphi(2x - \gamma), \quad k = 0, 1 \tag{4.1}$$

which generalizes (3.1) (of course $a_0(\gamma) = a(\gamma)$). Notice that for $k = 1$ (4.1) is an explicit formula; there is nothing to solve. But what kind of conditions should we put on the coefficients $a_k(\gamma)$? The same reasoning that led to (3.5) leads to

$$\sum_{\gamma' \in \mathbb{Z}} a_j(\gamma')\overline{a_k(2\gamma + \gamma')} = 2\delta(j, k)\delta(\gamma, 0). \tag{4.2}$$

On the other hand, the condition $\int \varphi(x)\,dx \ne 0$ is not something we can expect to hold for $\psi_1$ (think of the example of Haar functions), so condition (3.6) can only be recopied in our new notation

$$\sum_{\gamma \in \mathbb{Z}} a_0(\gamma) = 2. \tag{4.3}$$

Lemma 4.1. If $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is an orthonormal set and if $a_j(\gamma)$ satisfy (4.2) and (4.3) then $\{\psi_k(x - \gamma)\}_{\gamma \in \mathbb{Z},\, k = 0, 1}$ is an orthonormal set.

Proof: It suffices to show

$$\int \psi_j(x)\overline{\psi_k(x - \gamma)}\,dx = \delta(j, k)\delta(\gamma, 0). \tag{4.4}$$

Now

$$\int \psi_j(x)\overline{\psi_k(x - \gamma)}\,dx = \sum_{\gamma' \in \mathbb{Z}} \sum_{\gamma'' \in \mathbb{Z}} a_j(\gamma')\overline{a_k(\gamma'')} \int \varphi(2x - \gamma')\overline{\varphi(2x - 2\gamma - \gamma'')}\,dx.$$

But the integral is $(1/2)\delta(\gamma', 2\gamma + \gamma'')$ by the orthonormality of $\varphi(x - \gamma)$, so (4.4) reduces to (4.2). Q.E.D.


Remark. We have omitted the justification of the interchange of series and 
integrals, but in most of the examples we will look at the series are actually finite 
sums. 

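Thus $\{\psi_k(x - \gamma)\}_{\gamma \in \mathbb{Z},\, k = 0, 1}$ is an orthonormal set of functions in $V_1$. Is it a basis? (A kind of pseudo dimension counting argument makes this very plausible.) To show that it is a basis it suffices to represent each function $\varphi(2x - \tilde\gamma)$ as a linear combination, and we know the coefficients will have to be

$$\int \varphi(2x - \tilde\gamma)\overline{\psi_k(x - \gamma)}\,dx = \sum_{\gamma' \in \mathbb{Z}} \overline{a_k(\gamma')} \int \varphi(2x - \tilde\gamma)\overline{\varphi(2x - 2\gamma - \gamma')}\,dx = \frac{1}{2}\overline{a_k(\tilde\gamma - 2\gamma)}.$$

Thus we need to show that

$$\frac{1}{2} \sum_{k = 0, 1} \sum_{\gamma \in \mathbb{Z}} \overline{a_k(\tilde\gamma - 2\gamma)}\,\psi_k(x - \gamma) \tag{4.5}$$

is equal to $\varphi(2x - \tilde\gamma)$. But if we substitute (4.1) into (4.5) we obtain

$$\sum_{\gamma \in \mathbb{Z}} \Big(\frac{1}{2} \sum_{k = 0, 1} \sum_{\gamma' \in \mathbb{Z}} \overline{a_k(2\gamma' + \tilde\gamma)}\,a_k(2\gamma' + \gamma)\Big) \varphi(2x - \gamma)$$

so it suffices to show

$$\sum_{k = 0, 1} \sum_{\gamma' \in \mathbb{Z}} \overline{a_k(2\gamma' + \tilde\gamma)}\,a_k(2\gamma' + \gamma) = 2\delta(\gamma, \tilde\gamma), \tag{4.6}$$

for $\tilde\gamma = 0$ or 1.

Lemma 4.2. (4.6) always holds, hence $\{\psi_k(x - \gamma)\}_{\gamma \in \mathbb{Z},\, k = 0, 1}$ is an orthonormal basis for $V_1$.

Although this is a purely algebraic statement, we postpone the proof until the next section.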

Theorem 4.3. Suppose $\varphi$ generates a multiresolution analysis and $a_k(\gamma)$ satisfy (4.2) and (4.3) with $\psi_k$ defined by (4.1) and $\psi_0 = \varphi$. Then the functions $\{2^{j/2}\psi_1(2^j x - \gamma)\}$ for $j \in \mathbb{Z}$, $\gamma \in \mathbb{Z}$ form an orthonormal basis of $L^2(\mathbb{R})$.

Proof: As before, let $W_0$ denote the orthogonal complement of $V_0$ in $V_1$, $V_1 = V_0 \oplus W_0$. We claim $\{\psi_1(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is an orthonormal basis for $W_0$. This follows because we have merely taken the basis for $V_1$ given by Lemma 4.2 and removed $\{\psi_0(x - \gamma)\}_{\gamma \in \mathbb{Z}}$, which is a basis for $V_0$. By scaling we obtain

$$V_{j+1} = V_j \oplus W_j$$

and

$$\{2^{j/2}\psi_1(2^j x - \gamma)\}_{\gamma \in \mathbb{Z}}$$

is an orthonormal basis for $W_j$. But

$$L^2(\mathbb{R}) = \bigoplus_{j \in \mathbb{Z}} W_j$$

by the density condition. Q.E.D.


As a simple variation on the theme, which we leave as an exercise to the reader, the set of functions $\{\varphi(x - \gamma)\}$ for $\gamma \in \mathbb{Z}$ together with $\{2^{j/2}\psi_1(2^j x - \gamma)\}$ for $j \ge 0$, $\gamma \in \mathbb{Z}$ form an orthonormal basis of $L^2(\mathbb{R})$. The advantage of this variant is that we scale only to finer and finer resolutions ($j \to +\infty$) and take care of all the coarser resolutions ($j < 0$) by the single family $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$.

In summary, we have reduced the construction of wavelets to the solution of the algebraic identities (4.2) and (4.3), modulo some technical conditions to ensure the orthonormality condition (iv). Step 5 will be to actually produce the solutions to (4.2) and (4.3), and Step 6 will be to establish various properties of the wavelet functions: regularity, decay at infinity, and moment conditions.

The reason we have postponed some of the details in the construction so far is that they require a new technique. So it is now time to open the door and invite Fourier back in.

§5. THE VIEW FROM THE FOURIER TRANSFORM SIDE. Suppose we take the Fourier transform of everything in sight. Because most of our identities have a convolutional structure, we expect a simplification, with multiplicative identities arising in their place. Before doing so, let us return to the orthonormality question, because here the Fourier transform viewpoint gives us an entirely new handle on the problem. Given $\varphi \in L^2$, how can we tell from $\hat\varphi$ whether or not $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is orthonormal?

It will simplify matters if we adopt the convention (as in [SW]) that

$$\hat\varphi(x) = \int e^{2\pi i x y}\varphi(y)\,dy \tag{5.1}$$

so that the Fourier inversion formula is just

$$\hat{\hat\varphi}(x) = \varphi(-x) \tag{5.2}$$

and the Plancherel formula is

$$\|\hat\varphi\|_2 = \|\varphi\|_2 \tag{5.3}$$

(warning: not all the references follow this convention!).

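Lemma 5.1. $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is an orthonormal set if and only if

$$\sum_{\gamma \in \mathbb{Z}} |\hat\varphi(\xi + \gamma)|^2 = 1 \quad \text{for a.e. } \xi. \tag{5.4}$$

Proof: By the Plancherel formula, $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is orthonormal if and only if

$$\int e^{2\pi i \gamma \xi}|\hat\varphi(\xi)|^2\,d\xi = \delta(\gamma, 0). \tag{5.5}$$

But the integral over $\mathbb{R}$ can be broken up into an integral over $[0, 1]$ and a sum over $\mathbb{Z}$. Since $e^{2\pi i \gamma \xi}$ is periodic we obtain

$$\int_0^1 e^{2\pi i \gamma \xi} \sum_{\gamma' \in \mathbb{Z}} |\hat\varphi(\xi + \gamma')|^2\,d\xi = \delta(\gamma, 0)$$

which means that the function $\sum_{\gamma' \in \mathbb{Z}} |\hat\varphi(\xi + \gamma')|^2$ on $[0, 1]$ has as Fourier coefficients $\delta(\gamma, 0)$, hence must be the constant function given by (5.4). Q.E.D.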
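Criterion (5.4) can be tried out numerically on functions whose Fourier transforms are known. The sketch below is illustrative and rests on two assumptions: the infinite sum is truncated to $|\gamma| \le K$, and the functions tested are the characteristic function of $[0, 1]$, for which $|\hat\varphi(\xi)| = |\sin \pi\xi / \pi\xi|$ under convention (5.1), and its convolution with itself, for which the modulus is squared. The helper names are made up.

```python
import math

def sinc(x):
    """sin(pi x) / (pi x), with the value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def periodized_energy(power, xi, K=20000):
    """Truncation to |gamma| <= K of the sum in (5.4) for |phi_hat| = |sinc|^power."""
    return sum(sinc(xi + g) ** (2 * power) for g in range(-K, K + 1))

box_values = [periodized_energy(1, xi) for xi in (0.0, 0.1, 0.25, 0.5)]
hat_value = periodized_energy(2, 0.5)
```

For the characteristic function the truncated sums agree with 1 up to a tail of size about $2/(\pi^2 K)$, as the lemma predicts for orthonormal translates. For the self-convolution the sum at $\xi = 1/2$ is $1/3$, so its integer translates are not orthonormal.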

Now the scaling identity (4.1) transcribes easily into the condition

$$\hat\psi_k(\xi) = A_k(\tfrac{1}{2}\xi)\,\hat\varphi(\tfrac{1}{2}\xi) \tag{5.6}$$

where

$$A_k(\xi) = \frac{1}{2} \sum_{\gamma \in \mathbb{Z}} a_k(\gamma)e^{2\pi i \gamma \xi} \tag{5.7}$$

(exercise, using the definition of the Fourier transform and a change of variable). Notice that $A_k(\xi)$ is smooth and periodic. Then (4.3) says

$$A_0(0) = 1 \tag{5.8}$$

and (3.9) says

$$\hat\varphi(0) = 1. \tag{5.9}$$

By iterating (5.6) for $k = 0$ (remember $\psi_0 = \varphi$) we obtain the infinite product representation

$$\hat\varphi(\xi) = \prod_{j=1}^{\infty} A_0(2^{-j}\xi) \tag{5.10}$$

(using (5.8) we can justify the local uniform convergence of the infinite product). Substituting (5.10) back into (5.6) we obtain

$$\hat\psi_k(\xi) = A_k(\tfrac{1}{2}\xi) \prod_{j=2}^{\infty} A_0(2^{-j}\xi). \tag{5.11}$$

Thus the functions $A_k$ completely and explicitly determine the wavelets.
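The infinite product (5.10) is straightforward to evaluate numerically, since the factors tend to 1 geometrically. As a minimal sketch, assume the product is truncated after $J$ factors and take the Haar case, where $a_0(0) = a_0(1) = 1$ gives $A_0(\xi) = \frac{1}{2}(1 + e^{2\pi i \xi})$ and where the answer is known: under convention (5.1) the transform of the characteristic function of $[0, 1]$ is $e^{\pi i \xi}\sin \pi\xi / \pi\xi$. The function names are illustrative.

```python
import cmath
import math

def haar_A0(xi):
    """A_0 from (5.7) for the Haar coefficients a_0(0) = a_0(1) = 1."""
    return 0.5 * (1 + cmath.exp(2j * math.pi * xi))

def phi_hat(A0, xi, J=50):
    """Truncation of the infinite product (5.10) after J factors."""
    prod = 1.0 + 0j
    for j in range(1, J + 1):
        prod *= A0(xi / 2 ** j)
    return prod

def box_hat(xi):
    """Fourier transform (5.1) of the characteristic function of [0, 1]."""
    if xi == 0:
        return 1.0 + 0j
    return cmath.exp(1j * math.pi * xi) * math.sin(math.pi * xi) / (math.pi * xi)

points = [-7.5, -1.0, 0.0, 0.3, 2.0, 5.25]
errors = [abs(phi_hat(haar_A0, xi) - box_hat(xi)) for xi in points]
```

The same `phi_hat` works for any trigonometric polynomial $A_0$ with $A_0(0) = 1$, and multiplying `A1(xi / 2)` by the product started at $j = 2$ gives the wavelet transform in the form (5.11).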

The most intricate part of the transcription process is the identity (4.2) that the coefficients $a_k(\gamma)$ must satisfy. What does this tell us about the functions $A_k$? Rather than deal with this question directly (try it as an exercise, after the fact) we repeat the process which led to (4.2), namely the consistency of (4.1), alias (5.6), with the orthonormality, alias (5.4). In other words, if $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is orthonormal then (5.4) must hold, and if (5.6) defines $\hat\psi_k$ then we want the analogue of (5.4), namely

$$\sum_{\gamma \in \mathbb{Z}} \hat\psi_k(\xi + \gamma)\overline{\hat\psi_j(\xi + \gamma)} = \delta_{jk}. \tag{5.12}$$

Now let $\eta_1 = 0$ and $\eta_2 = 1/2$. These are representatives of the cosets of the subgroup $\mathbb{Z}$ in $(1/2)\mathbb{Z}$. Then points of the lattice $\mathbb{Z}$ can be represented uniquely as $2(\gamma + \eta_p)$ as $\gamma$ varies in $\mathbb{Z}$ and $p = 1, 2$. Then

$$\sum_{\gamma \in \mathbb{Z}} \hat\psi_k(\xi + \gamma)\overline{\hat\psi_j(\xi + \gamma)} = \sum_{p=1}^{2} \sum_{\gamma \in \mathbb{Z}} \hat\psi_k(\xi + 2(\gamma + \eta_p))\overline{\hat\psi_j(\xi + 2(\gamma + \eta_p))}$$

by the above parametrization of $\mathbb{Z}$, and if we substitute (5.6) and use the periodicity of $A_k$ we obtain

$$\sum_{p=1}^{2} A_k(\tfrac{1}{2}\xi + \eta_p)\overline{A_j(\tfrac{1}{2}\xi + \eta_p)} \sum_{\gamma \in \mathbb{Z}} |\hat\varphi(\tfrac{1}{2}\xi + \eta_p + \gamma)|^2.$$

The inner sum over $\mathbb{Z}$ yields the constant 1, and so (5.12) yields the consistency condition

$$\sum_{p=1}^{2} A_k(\xi + \eta_p)\overline{A_j(\xi + \eta_p)} = \delta_{jk}. \tag{5.13}$$

This is the Fourier transform equivalent of (4.2). Note that (5.13) implies

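$$|A_k(\xi)| \le 1 \tag{5.14}$$

which implies the boundedness of the Fourier transforms $\hat\psi_k$.

We can now easily supply the missing proof of Lemma 4.2. Notice that (5.13) says that for every $\xi$ the $2 \times 2$ matrix $\{A_k(\xi + \eta_p)\}$ is unitary by rows. But this is equivalent to being unitary by columns,

$$\sum_{k = 0, 1} A_k(\xi + \eta_p)\overline{A_k(\xi + \eta_q)} = \delta_{pq}. \tag{B2.1}$$

Now substituting (5.7) into (B2.1) we obtain

$$\sum_{\gamma \in \mathbb{Z}} \Big(\frac{1}{4} \sum_{k = 0, 1} \sum_{\gamma' \in \mathbb{Z}} a_k(\gamma' + \gamma)\overline{a_k(\gamma')}\,e^{2\pi i (\gamma' + \gamma)\eta_p} e^{-2\pi i \gamma' \eta_q}\Big) e^{2\pi i \gamma \xi} = \delta_{pq}.$$

Regarding this as an identity between Fourier series expansions we can equate coefficients to conclude

$$\frac{1}{4} \sum_{k = 0, 1} \sum_{\gamma' \in \mathbb{Z}} a_k(\gamma' + \gamma)\overline{a_k(\gamma')}\,e^{2\pi i (\gamma' + \gamma)\eta_p} e^{-2\pi i \gamma' \eta_q} = \delta_{pq}\delta(\gamma, 0).$$

Choosing $\eta_p = 0$ and summing over $q$ we obtain (4.6) for $\tilde\gamma = 0$ since

$$\sum_{q=1}^{2} e^{-2\pi i \gamma' \eta_q} = \begin{cases} 2 & \text{if } \gamma' \in 2\mathbb{Z} \\ 0 & \text{otherwise.} \end{cases}$$

Similarly, choosing $\eta_p = 1/2$, multiplying by $e^{2\pi i \eta_q}$ and summing over $q$ we obtain (4.6) for $\tilde\gamma = 1$.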


The time has come to grasp the bull by the horns and prove the orthonormality of $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ directly. For this we will need an additional hypothesis.

Theorem 5.2. Suppose

$$A_0(\xi) \ne 0 \quad \text{for } |\xi| \le \tfrac{1}{4}. \tag{5.15}$$

Then $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is orthonormal.

Proof: We construct a sequence of functions $\varphi_j$ such that $\{\varphi_j(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is orthonormal, and such that $\varphi_j \to \varphi$ in $L^2$ norm as $j \to \infty$. For $\varphi_0$ we simply take $\hat\varphi_0(\xi) = \chi_{[-1/2, 1/2]}(\xi)$. Then $\{\varphi_0(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is orthonormal by Lemma 5.1 because (5.4) has exactly one non-zero term.

Inductively define functions $\varphi_j$ by

$$\hat\varphi_j(\xi) = A_0(\tfrac{1}{2}\xi)\,\hat\varphi_{j-1}(\tfrac{1}{2}\xi). \tag{5.16}$$

We claim that $\{\varphi_j(x - \gamma)\}_{\gamma \in \mathbb{Z}}$ is again orthonormal. This follows immediately from (5.13) with $j = k = 0$ and Lemma 5.1. It can also be deduced from

$$\varphi_j(x) = \sum_{\gamma \in \mathbb{Z}} a_0(\gamma)\varphi_{j-1}(2x - \gamma) \tag{5.17}$$

which is the non-Fourier transform version of (5.16), and (4.2). Note that

$$\hat\varphi_j(\xi) = \prod_{k=1}^{j} A_0(2^{-k}\xi)\,\chi_{[-1/2, 1/2]}(2^{-j}\xi) \tag{5.18}$$

so that $\hat\varphi_j \to \hat\varphi$ pointwise, by (5.10).

We would like to show $\varphi_j \to \varphi$ in $L^2$ norm. This will suffice to complete the proof, because the norm limit of orthonormal sets is an orthonormal set. This is the key point of the proof, where the non-vanishing hypothesis must be used. (As an interesting exercise, see how the argument breaks down for the counterexample given in §3.)

By the Plancherel formula it suffices to show $\hat\varphi_j \to \hat\varphi$ in $L^2$ norm, and since we have pointwise convergence we would like to use the dominated convergence theorem. Note first that $\hat\varphi \in L^2$ by Fatou’s theorem, since it is the pointwise limit of $\hat\varphi_j$ and $\|\hat\varphi_j\|_2 = 1$. Thus we can use a multiple of $\hat\varphi$ as a dominator. By comparing (5.18) and (5.10) we see

$$\hat\varphi_j(\xi) = \begin{cases} \dfrac{\hat\varphi(\xi)}{\hat\varphi(2^{-j}\xi)} & \text{if } |\xi| \le 2^{j-1} \\ 0 & \text{otherwise.} \end{cases} \tag{5.19}$$

We claim that $|\hat\varphi|$ is bounded from below on $[-1/2, 1/2]$. The point is that $\hat\varphi$ is continuous, and by (5.15) $A_0(2^{-j}\xi) \ne 0$ for $|\xi| \le 1/2$. Thus $\hat\varphi$ doesn’t vanish on $[-1/2, 1/2]$, so $|\hat\varphi_j(\xi)| \le c|\hat\varphi(\xi)|$ for $c = \big(\min_{|\xi| \le 1/2} |\hat\varphi(\xi)|\big)^{-1}$. Q.E.D.


§6. THE RECIPE. So now we have indicated all the major steps in the construction, but we have left the first to last. We need to find actual solutions to the algebraic identities (5.8), (5.13) and (5.15). There are several different approaches to this problem. We describe one that is due to Ingrid Daubechies [D1].

We look for solutions with only a finite number of $a_k(\gamma)$ different from zero, which means $A_k(\xi)$ are trigonometric polynomials. This implies that the scaling function $\varphi$ and wavelet $\psi_1$ have compact support. This can be seen most easily from the iteration procedure (3.7) and (3.8). Say $a(\gamma) = 0$ unless $\gamma \in [0, N]$; then if $f$ has support in $[0, N]$, so does $Sf$.

We concentrate first on finding the function $A_0$, which must satisfy three conditions:

$$A_0(0) = 1 \tag{6.1}$$

$$|A_0(\xi)|^2 + |A_0(\xi + \tfrac{1}{2})|^2 = 1 \tag{6.2}$$

$$A_0(\xi) \ne 0 \quad \text{for } |\xi| \le \tfrac{1}{4} \tag{6.3}$$

(here (6.1) is (5.8), (6.2) is (5.13) for $j = k = 0$ and (6.3) is (5.15)). And, of course, $A_0$ must be of the form

$$A_0(\xi) = \frac{1}{2} \sum_{\gamma \in \mathbb{Z}} a_0(\gamma)e^{2\pi i \gamma \xi} \quad \text{(finite sum).} \tag{6.4}$$

Note that $|A_0(\xi)|^2$ is then of the same form.

Now we already know one solution, namely

$$A_0(\xi) = \tfrac{1}{2}(1 + e^{2\pi i \xi}) = e^{\pi i \xi}\cos \pi\xi$$

which yields the Haar wavelets. This was deemed unsatisfactory because the wavelets are not continuous. One way to create continuity and even differentiability is to take convolution powers, or on the Fourier transform side to take ordinary powers. Thus we are tempted to try $A_0(\xi) = (e^{\pi i \xi}\cos \pi\xi)^N$ for some large $N$. Unfortunately (6.2) no longer holds, but we can fix this up. Note that $\cos \pi(\xi + 1/2) = -\sin \pi\xi$, so that is why $|\cos \pi\xi|^2 + |\cos \pi(\xi + 1/2)|^2 = 1$.

Now take the identity $\cos^2 \pi\xi + \sin^2 \pi\xi = 1$ and raise it to an odd power, say

$$\begin{aligned} 1 &= (\cos^2 \pi\xi + \sin^2 \pi\xi)^5 \\ &= \cos^{10} \pi\xi + 5\cos^8 \pi\xi \sin^2 \pi\xi + 10\cos^6 \pi\xi \sin^4 \pi\xi \\ &\quad + 10\cos^4 \pi\xi \sin^6 \pi\xi + 5\cos^2 \pi\xi \sin^8 \pi\xi + \sin^{10} \pi\xi. \end{aligned}$$

Take the first half of the terms for $|A_0|^2$,

$$|A_0(\xi)|^2 = \cos^{10} \pi\xi + 5\cos^8 \pi\xi \sin^2 \pi\xi + 10\cos^6 \pi\xi \sin^4 \pi\xi. \tag{6.5}$$

Replacing $\xi$ by $\xi + 1/2$ turns these into the second half of the terms, so (6.2) is automatic, and (6.1) and (6.3) are easy. This gives a recipe for producing $|A_0|^2$, and it remains to take a square root of the form (6.4). We would also like to take the coefficients $a_0(\gamma)$ in (6.4) to be real, for that will yield a real-valued scaling function (and in the end real-valued wavelets as well). There is a general theorem of F. Riesz that asserts that this is possible, but in this case it is easy enough to accomplish by trial and error. Since

$$\begin{aligned} |A_0(\xi)|^2 &= \cos^6 \pi\xi \left(\cos^4 \pi\xi + 5\cos^2 \pi\xi \sin^2 \pi\xi + 10\sin^4 \pi\xi\right) \\ &= \cos^6 \pi\xi \left(\big(\cos^2 \pi\xi - \sqrt{10}\sin^2 \pi\xi\big)^2 + \big(5 + 2\sqrt{10}\big)\cos^2 \pi\xi \sin^2 \pi\xi\right) \end{aligned}$$

we can take

$$\begin{aligned} A_0(\xi) &= (e^{\pi i \xi}\cos \pi\xi)^3 \left(\cos^2 \pi\xi - \sqrt{10}\sin^2 \pi\xi + i\sqrt{5 + 2\sqrt{10}}\,\cos \pi\xi \sin \pi\xi\right) \\ &= \frac{1}{8}(e^{2\pi i \xi} + 1)^3 \left(\frac{1 - \sqrt{10}}{2} + \frac{1 + \sqrt{10}}{4}\big(e^{2\pi i \xi} + e^{-2\pi i \xi}\big) + \frac{1}{4}\sqrt{5 + 2\sqrt{10}}\,\big(e^{2\pi i \xi} - e^{-2\pi i \xi}\big)\right) \end{aligned} \tag{6.6}$$

which is clearly of the form (6.4) with $a_0(\gamma)$ real and $a_0(\gamma) \ne 0$ only if $-1 \le \gamma \le 4$.
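The six coefficients $a_0(\gamma)$ hidden in (6.6) can be extracted by multiplying out the polynomials in $z = e^{2\pi i \xi}$. The following sketch assumes the factorization $A_0 = z^{-1}\,\frac{1}{8}(1 + z)^3 (q_0 + q_1 z + q_2 z^2)$, obtained from the second line of (6.6) by pulling out $z^{-1}$; the names `poly_mul`, `q`, `a0`, `A0` and `A0_direct` are illustrative.

```python
import cmath
import math

def poly_mul(p, q):
    """Multiply two polynomials given as lists of ascending coefficients."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

r10 = math.sqrt(10.0)
s = math.sqrt(5.0 + 2.0 * r10)
# Second factor of (6.6), multiplied by z: q0 + q1 z + q2 z^2
q = [(1 + r10 - s) / 4, (1 - r10) / 2, (1 + r10 + s) / 4]
# A_0 = (1/2) sum a_0(g) z^g, so a_0 = 2 * (1/8) * coefficients of (1 + z)^3 * q
coeffs = [c / 4 for c in poly_mul([1.0, 3.0, 3.0, 1.0], q)]
a0 = {g - 1: c for g, c in enumerate(coeffs)}      # indices -1, ..., 4

def A0(xi):
    """A_0 in the form (6.4), built from the extracted coefficients."""
    return 0.5 * sum(c * cmath.exp(2j * math.pi * g * xi) for g, c in a0.items())

def A0_direct(xi):
    """A_0 evaluated from the first line of (6.6)."""
    c, sn = math.cos(math.pi * xi), math.sin(math.pi * xi)
    return (cmath.exp(1j * math.pi * xi) * c) ** 3 * (c * c - r10 * sn * sn + 1j * s * c * sn)

grid = [i / 200 for i in range(-50, 51)]           # |xi| <= 1/4
min_modulus = min(abs(A0(xi)) for xi in grid)
```

Numerically the coefficients are approximately 0.0498, -0.1208, -0.1909, 0.6504, 1.1411, 0.4705 for $\gamma = -1, \dots, 4$. The two evaluations of $A_0$ agree, conditions (6.1) and (6.2) hold to rounding error, and on $|\xi| \le 1/4$ the modulus stays above $1/\sqrt{2}$ up to rounding, which is (6.3) with room to spare.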

To complete the story we need to find $A_1(\xi)$, also of the form (6.4), which satisfies

$$|A_1(\xi)|^2 + |A_1(\xi + \tfrac{1}{2})|^2 = 1 \tag{6.7}$$

and

$$A_0(\xi)\overline{A_1(\xi)} + A_0(\xi + \tfrac{1}{2})\overline{A_1(\xi + \tfrac{1}{2})} = 0 \tag{6.8}$$

(these are the remaining conditions of (5.13)). Fortunately, this can be accomplished just by taking

$$A_1(\xi) = e^{2\pi i \xi}\,\overline{A_0(\xi + \tfrac{1}{2})} \tag{6.9}$$

which amounts to setting

$$a_1(\gamma) = (-1)^{\gamma + 1}\,\overline{a_0(1 - \gamma)}. \tag{6.10}$$

Then (6.7) and (6.8) follow directly from (6.2) and the periodicity of $A_0$. Note also that $a_1(\gamma)$ are real valued if $a_0(\gamma)$ are.
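Rule (6.10) is mechanical enough to code directly, and the resulting pair can be checked against the full set of identities (4.2). In this sketch the coefficients are dictionaries from $\gamma$ to $a_k(\gamma)$ and the helper names are made up. The Haar coefficients come from the text; the four-tap example is the standard Daubechies four-coefficient filter rescaled to sum to 2, which is an outside example rather than something derived above.

```python
import math

def wavelet_coeffs(a0):
    """(6.10): a_1(g) = (-1)^(g+1) * conj(a_0(1 - g))."""
    a1 = {}
    for m, v in a0.items():
        g = 1 - m
        sign = 1.0 if (g + 1) % 2 == 0 else -1.0
        a1[g] = sign * complex(v).conjugate()
    return a1

def cross(aj, ak, g):
    """Left side of (4.2): sum over g' of a_j(g') * conj(a_k(2g + g'))."""
    return sum(v * complex(ak.get(2 * g + i, 0.0)).conjugate() for i, v in aj.items())

def satisfies_4_2(a0, a1, tol=1e-12):
    """Check (4.2) for all j, k in {0, 1} and all shifts that can overlap."""
    family = [a0, a1]
    span = max(max(a0), max(a1)) - min(min(a0), min(a1))
    for j in (0, 1):
        for k in (0, 1):
            for g in range(-span, span + 1):
                target = 2.0 if (j == k and g == 0) else 0.0
                if abs(cross(family[j], family[k], g) - target) > tol:
                    return False
    return True

r3 = math.sqrt(3.0)
haar = {0: 1.0, 1: 1.0}
four_tap = {0: (1 + r3) / 4, 1: (3 + r3) / 4, 2: (3 - r3) / 4, 3: (1 - r3) / 4}
haar_wavelet = wavelet_coeffs(haar)
four_tap_wavelet = wavelet_coeffs(four_tap)
```

For the Haar coefficients the rule gives $a_1(0) = -1$, $a_1(1) = 1$, which is the negative of the generator in (2.4); the sign is immaterial for orthonormality. Note that the support of $a_1$ is the reflection $\gamma \mapsto 1 - \gamma$ of the support of $a_0$.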

The Fourier transform of $\psi_1$ is given by (5.11), which now reads

$$\hat\psi_1(\xi) = A_1(\tfrac{1}{2}\xi) \prod_{j=2}^{\infty} A_0(2^{-j}\xi) \tag{6.11}$$

with $A_0$ given by (6.6) and $A_1$ by (6.9). If we want to obtain the wavelet $\psi_1$ itself rather than its Fourier transform we first find $\psi_0 = \varphi$ by iterating the mapping

$$Sf(x) = \sum_{\gamma} a_0(\gamma)f(2x - \gamma) \tag{6.12}$$

starting with any reasonable $f$ satisfying $\int f(x)\,dx = 1$, and then setting

$$\psi_1(x) = \sum_{\gamma} a_1(\gamma)\varphi(2x - \gamma). \tag{6.13}$$

See Figures 2 and 3.




Figure 3. The graph of the wavelet generator $\psi_1$, courtesy of David Aronstein.

There is an alternative approach to constructing the scaling function that yields a different wavelet basis. It has the advantage of requiring less algebra, but the disadvantage of producing wavelets that are not compactly supported. Start with the Haar basis scaling function $\chi_{[0,1]}$, whose Fourier transform is $e^{\pi i \xi}(\sin \pi\xi / \pi\xi)$, and take the $N$-fold convolution product

$$g = \chi_{[0,1]} * \chi_{[0,1]} * \cdots * \chi_{[0,1]} \quad (N \text{ factors})$$

so that

$$\hat g(\xi) = \left(e^{\pi i \xi}\,\frac{\sin \pi\xi}{\pi\xi}\right)^N.$$

It is easy to see that $g \in C^{N-1}$, but of course we have destroyed the orthonormality of translates by $\mathbb{Z}$ that $\chi_{[0,1]}$ had. Too bad, but this is easily fixed. Write

$$h(\xi) = \Big(\sum_{\gamma \in \mathbb{Z}} |\hat g(\xi + \gamma)|^2\Big)^{1/2}$$

and observe that $h$ is periodic and

$$0 < c_1 \le h(\xi) \le c_2 < \infty.$$

Then we have only to take

$$\hat\varphi(\xi) = \hat g(\xi)/h(\xi)$$

and (5.4) is automatic, so we have the orthonormality of $\{\varphi(x - \gamma)\}_{\gamma \in \mathbb{Z}}$. Notice that $\hat g(0) = 1$ and $\hat g(\gamma) = 0$ for $\gamma \ne 0$ so $\hat\varphi(0) = 1$ as required. And it is not difficult to show that $\varphi \in C^{N-1}$.

What about the scaling identity? Well, it certainly holds for $g$, namely

$$\hat g(\xi) = B(\xi/2)\,\hat g(\xi/2)$$

where

$$B(\xi) = (e^{\pi i \xi}\cos \pi\xi)^N$$

has the required form (6.4). It then follows that

$$\hat\varphi(\xi) = A_0(\xi/2)\,\hat\varphi(\xi/2)$$

where

$$A_0(\xi) = B(\xi)h(\xi)/h(2\xi).$$

Now $A_0$ is periodic, so it must have the form (6.4), but the sum is no longer finite. This is where we lose the compact support of $\varphi$. On the other hand $A_0$ is clearly smooth, so the Fourier coefficients in (6.4) must be rapidly decreasing, which implies that $\varphi$ is rapidly decreasing.
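The function $A_0 = B\,h(\xi)/h(2\xi)$ has no finite formula in general, but its modulus is easy to approximate and to test against (6.2). The sketch below makes two assumptions: the sum defining $h^2$ is truncated to $|\gamma| \le K$, and $N \ge 2$ so that the terms decay at least like $\gamma^{-4}$. The function names are illustrative.

```python
import math

def sinc(x):
    """sin(pi x) / (pi x), with the value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def h_squared(xi, N, K=200):
    """Truncation to |gamma| <= K of sum |g_hat(xi + gamma)|^2, |g_hat| = |sinc|^N."""
    return sum(sinc(xi + g) ** (2 * N) for g in range(-K, K + 1))

def A0_abs_squared(xi, N, K=200):
    """|A_0(xi)|^2 for A_0 = B h(xi) / h(2 xi), with |B(xi)|^2 = cos(pi xi)^(2N)."""
    return math.cos(math.pi * xi) ** (2 * N) * h_squared(xi, N, K) / h_squared(2 * xi, N, K)

N = 2
points = [0.0, 0.1, 0.2, 0.37]
defects = [abs(A0_abs_squared(xi, N) + A0_abs_squared(xi + 0.5, N) - 1.0) for xi in points]
h2_samples = [h_squared(i / 50, N) for i in range(51)]
```

For $N = 2$ the values of $h^2$ stay between $1/3$ and $1$, which gives explicit constants $c_1, c_2$ in that case, and the defect in (6.2) is at the level of the truncation error. Identity (6.2) holds here because splitting the sum for $h(2\xi)^2$ into even and odd $\gamma$ and applying the scaling identity for $\hat g$ gives $h(2\xi)^2 = |B(\xi)|^2 h(\xi)^2 + |B(\xi + \tfrac{1}{2})|^2 h(\xi + \tfrac{1}{2})^2$.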

The construction of /1,(£) and the wavelet Fourier transform (//,(£) then 
proceeds via (6.9) and (6.11) as before. 
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§7. SMOOTHNESS OF WAVELETS. How smooth are our wavelets? Since we 
understand them best on the Fourier transform side, we will use the principle that 
decay at infinity of <p implies smoothness of <p (we will establish smoothness of the 
scaling function and pass it on to the wavelets via (6.13)). For example, it is easy to 
show 

\9(€)\*c(1 + |£| y N ~ l - e (7.1) 

implies <p e C N . So how do we establish (7.1)? 

We have the infinite product representation (5.10) which says

$$\hat\varphi(\xi) = \prod_{k=1}^{\infty} A_0(2^{-k}\xi) \tag{7.2}$$

and $A_0$ is periodic. Since each factor does not decay at infinity, why should the
product? This is a mystery, which is best solved by looking at the simplest case,
$A_0(\xi) = \cos \pi\xi$. Then

$$\prod_{k=1}^{\infty} \cos 2^{-k}\pi\xi = \frac{\sin \pi\xi}{\pi\xi} \tag{7.3}$$

does decay at the rate $O(|\xi|^{-1})$. (Formula (7.3) was proved by Euler, but special
cases were known by François Viète in the late 1500’s. You can prove it by
considering the Fourier transform of $\chi_{[-1/2,1/2]}$ and its scaling properties.)
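
Formula (7.3) is easy to test numerically. The sketch below compares a partial product with the closed form. The number of factors (40) and the sample values of $\xi$ are arbitrary choices, and the function names are illustrative.

```python
import math


def cosine_product(xi, terms):
    """Partial product of cos(2**-k * pi * xi) for k = 1, ..., terms."""
    product = 1.0
    for k in range(1, terms + 1):
        product *= math.cos(math.pi * xi / 2 ** k)
    return product


def sinc(xi):
    """Right-hand side of (7.3): sin(pi xi) / (pi xi)."""
    return 1.0 if xi == 0 else math.sin(math.pi * xi) / (math.pi * xi)


for xi in (0.3, 2.5, 17.25):
    print(xi, cosine_product(xi, 40), sinc(xi))
```

At $\xi = 1/2$ the identity reduces to Viète's product $2/\pi = \cos(\pi/4) \cos(\pi/8) \cos(\pi/16) \cdots$.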

Clearly, for most choices of $\xi$, the values of $\cos 2^{-k}\pi\xi$ will occasionally become
small, and that makes the product (7.3) small. You might try to get around this by
taking $\xi = 2^N$ for large $N$. Thus $\cos 2^{-k}\pi\xi = \pm 1$ for $k = 1, \ldots, N$, so there is no
decay, but then $\cos 2^{-N-1}\pi\xi = 0$ wipes you out. You can try to quantify this
line of reasoning, but there is no great payoff in showing, for example, that
$\sin \pi\xi / \pi\xi = O(|\xi|^{-2/3})$, so we will take (7.3) as our starting point.

The expression (6.6) for $A_0$, or any of its more complicated cousins, contains
$\cos \pi\xi$ as a factor, many times. Thus $\hat\varphi(\xi)$ contains $\sin \pi\xi/\xi$ as a factor many
times, hence we expect decay. Unfortunately, the other factor grows. It is easier to
work with $|A_0|^2$ given by (6.5), if we remember to take the square root at the end.
We have, for the special case considered,

$$|A_0(\xi)|^2 = (\cos \pi\xi)^6 (\cos^4 \pi\xi + 5 \cos^2 \pi\xi \sin^2 \pi\xi + 10 \sin^4 \pi\xi).$$

The first factor produces decay $O(|\xi|^{-6})$. The second factor can be written
$1 + 3 \sin^2 \pi\xi + 6 \sin^4 \pi\xi$ so it clearly has a maximum value 10 at $\xi = 1/2$. We can
obtain a crude estimate for the growth rate produced by the second factor by the
following reasoning: if $|\xi| \approx 2^N$ then there will be about $N$ factors where $2^{-k}|\xi|$ is
large, so an upper bound for the product is a constant times $10^N$. But $10^N \approx |\xi|^a$
for $a = \log 10/\log 2 \approx 3.32$. So the growth rate is at most $O(|\xi|^{3.32})$ so the
combination gives $O(|\xi|^{-2.68})$ for $|\hat\varphi(\xi)|^2$ hence $O(|\xi|^{-1.34})$ for $\hat\varphi(\xi)$.
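
The algebra and arithmetic in this estimate can be confirmed with a few lines of code. This is only a sketch: the grid of 1001 sample points is an arbitrary choice, and the function names are illustrative.

```python
import math


def second_factor(xi):
    """cos^4 + 5 cos^2 sin^2 + 10 sin^4, evaluated at pi * xi."""
    c2 = math.cos(math.pi * xi) ** 2
    s2 = math.sin(math.pi * xi) ** 2
    return c2 ** 2 + 5 * c2 * s2 + 10 * s2 ** 2


def second_factor_rewritten(xi):
    """The same quantity in the form 1 + 3 sin^2 + 6 sin^4."""
    s2 = math.sin(math.pi * xi) ** 2
    return 1 + 3 * s2 + 6 * s2 ** 2


# Largest value over one period, sampled on a grid that contains xi = 1/2.
grid_max = max(second_factor(j / 1000) for j in range(1001))

a = math.log(10) / math.log(2)   # growth exponent in the crude bound
decay_squared = 6 - a            # resulting decay exponent for |phi^|^2
decay = decay_squared / 2        # resulting decay exponent for |phi^|

print(grid_max, round(a, 2), round(decay_squared, 2), round(decay, 2))
```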

This is a disappointing estimate. According to (7.1) it suffices only to show that
$\varphi$ is continuous. It can be improved, but not by a lot. To see why, consider
$\xi = 2^N/3$. Then for each of the $N$ factors $2^{-k}\xi = 2^{N-k}/3$, $1 \le k \le N$, we have
$1 + 3 \sin^2(2^{N-k}\pi/3) + 6 \sin^4(2^{N-k}\pi/3) = 1 + 3 \cdot (\sqrt{3}/2)^2 + 6 (\sqrt{3}/2)^4 = 6.625$ so
a lower bound for $a$ is $\log 6.625/\log 2$ which yields $O(|\xi|^{-1.636})$ as the optimal
improvement.
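
This behavior along $\xi = 2^n/3$ can be observed directly from the product (7.2). The sketch below uses the expression for $|A_0(\xi)|^2$ displayed above. Truncating the product at 60 factors is an assumption made for the illustration, and the function names are illustrative.

```python
import math


def a0_squared(xi):
    """|A_0(xi)|^2 = cos^6(pi xi) * (1 + 3 sin^2(pi xi) + 6 sin^4(pi xi))."""
    c2 = math.cos(math.pi * xi) ** 2
    s2 = 1.0 - c2
    return c2 ** 3 * (1 + 3 * s2 + 6 * s2 ** 2)


def phi_hat_squared(xi, terms=60):
    """Truncated product for |phi^(xi)|^2, following (7.2)."""
    product = 1.0
    for k in range(1, terms + 1):
        product *= a0_squared(xi / 2 ** k)
    return product


# Along xi = 2**n / 3, doubling xi multiplies |phi^|^2 by |A_0(2**n / 3)|^2,
# which equals (1/4)**3 * 6.625 = 6.625 / 64.
ratios = [phi_hat_squared(2 ** (n + 1) / 3) / phi_hat_squared(2 ** n / 3)
          for n in range(1, 8)]
exponent = (6 - math.log(6.625) / math.log(2)) / 2   # decay exponent for |phi^|

print(ratios)
print(exponent)
```

Each doubling of $\xi$ multiplies $|\hat\varphi|^2$ by $6.625/64 = 2^{-(6 - \log_2 6.625)}$, so along this sequence $|\hat\varphi(\xi)|$ decays exactly like $|\xi|^{-1.636}$, and no estimate of the form (7.1) with a larger exponent can hold.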

If we consider the family of wavelets constructed as outlined in §6, we will have
$|A_0(\xi)|^2$ written as the product of higher and higher powers of $\cos \pi\xi$ by more and
more complicated second factors. Thus we have faster decay times faster growth in
$\hat\varphi(\xi)$. Which wins? Well, it is a close race! It turns out that the decay wins, but the
crude method of estimating the growth used above is not good enough to show
this. The final result ([D1], [C2]) is that to create wavelets of class $C^N$ we need to
carry out the construction starting with $(\cos^2 \pi\xi + \sin^2 \pi\xi)^M = 1$ for $M$ on the
order of $5(N + 1)$. This means that there is a rather high price to pay in terms of
complexity (the algebra required to pass from $|A_0|^2$ to $A_0$, for example) in order
to gain a moderate amount of smoothness. (More recently, better techniques have
been found to estimate the smoothness directly, without involving the Fourier
transform [DL].) Figure 4 shows the graph of $\hat\varphi(\xi)$. See [JRS] for a discussion of
the surprising self-similarity properties of this function.

Figure 4. The graph of $\hat\varphi$, after factoring out a power of $\sin \pi x/\pi x$, courtesy of Prem Janardhan and
David Rosenblum.

In addition to smoothness, another important property of wavelets is the
vanishing moment conditions

$$\int_{-\infty}^{\infty} x^k \psi_1(x)\,dx = 0, \quad k = 0, 1, \ldots, N \tag{7.4}$$

which are equivalent to the vanishing of the Fourier transform to high order at the
origin,

$$\left( \frac{d}{d\xi} \right)^{k} \hat\psi_1(0) = 0, \quad k = 0, 1, \ldots, N. \tag{7.5}$$

In contrast to smoothness, however, it is only the wavelet, not the scaling function,
which enjoys this property. The significance of this condition is that it implies a
weak form of localization in the frequency (Fourier transform) variable, since the
Fourier transform of $\psi_1(2^j x - k)$ is mainly concentrated around values of $|\xi|$ on
the order of $2^j$. (There is yet another family of wavelets in which the Fourier
transform is actually supported in an annular region $c_1 2^j \le |\xi| \le c_2 2^j$. See [M]
for a description of these “Littlewood-Paley” type wavelets.) For our wavelets the
verification of (7.5) is easy. From (6.11) we see that $\hat\psi_1$ has a factor $A_1((1/2)\xi)$,
and from (6.9) we see that $A_1$ at $\xi = 0$ has the same order zero as $A_0$ at $\xi = 1/2$.
But $A_0$ has a factor of $\cos \pi\xi$ to a power, hence vanishes at $\xi = 1/2$ to order 3 in
our particular example, and to order $M$ if we start with $(\cos^2 \pi\xi + \sin^2 \pi\xi)^M = 1$
in our construction. Note that in general conditions (6.1) and (6.2) imply that
$A_0(1/2) = 0$, and the flatter we make $A_0$ near $\xi = 0$, the more it vanishes near
$\xi = 1/2$.
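
The order of vanishing at $\xi = 1/2$ in the particular example can be seen numerically. This is a sketch using the expression for $|A_0(\xi)|^2$ given earlier in this section. The sample offsets $t$ are arbitrary, and the function name is illustrative.

```python
import math


def a0_squared(xi):
    """|A_0(xi)|^2 = cos^6(pi xi) * (1 + 3 sin^2(pi xi) + 6 sin^4(pi xi))."""
    c2 = math.cos(math.pi * xi) ** 2
    s2 = 1.0 - c2
    return c2 ** 3 * (1 + 3 * s2 + 6 * s2 ** 2)


# If A_0 vanishes to order 3 at xi = 1/2, then |A_0(1/2 + t)|^2 / t^6 should
# approach a finite nonzero limit as t -> 0. Here that limit is 10 * pi^6,
# because cos(pi/2 + pi t) = -sin(pi t) and the second factor equals 10 at 1/2.
limit = 10 * math.pi ** 6
for t in (1e-1, 1e-2, 1e-3):
    print(t, a0_squared(0.5 + t) / t ** 6, limit)
```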

§8. CONCLUDING REMARKS. Why not try to create your own designer wavelets
by programming the recipe given in §6, and taking the square root of $|A_0(\xi)|^2$ in a
different way? For a more detailed discussion of the Riesz Lemma for doing this
see [D1].

For further information about wavelets, including historic accounts and attribution
of results, see the books [M], [BF], [BC] or the expository lectures [D2] and
[FJW]. The term “wavelet” is also used to describe expansions in terms of 
functions which are not orthogonal. These wavelets have a simpler algebraic 
description, which is useful for some applications. An expanded version of this 
article, including a discussion of wavelet bases in several variables, will appear in 
[BF]. None of the theorems or proofs presented here are original; I have only tried 
to organize the material in a way that is easy to digest. 